Band selection can effectively reduce the spatial redundancy of hyperspectral data and support subsequent classification. The multi-kernel fuzzy rough set model can analyze numerical data that contain uncertainty and approximate descriptions, and the grasshopper optimization algorithm solves optimization problems with strong exploration and exploitation capabilities. By introducing the multi-kernel fuzzy rough set model into hyperspectral uncertainty modeling and using the grasshopper optimization algorithm to search for the band subset, a hyperspectral band selection algorithm based on the multi-kernel fuzzy rough set and the grasshopper optimization algorithm was proposed. Firstly, a multi-kernel operator was used to measure similarity in order to improve the adaptability of the model to the data distribution. A band correlation measure based on the kernel fuzzy rough set was defined, in which the correlation between bands is measured by the lower approximation distribution of ground objects at different pixels in the fuzzy rough set. Then, band dependence, band information entropy and band correlation were considered together to define the fitness function of a band subset. Finally, with J48 and K-Nearest Neighbor (KNN) as the classifiers, the proposed algorithm was compared with the Band Correlation Analysis (BCA) and Normalized Mutual Information (NMI) algorithms in terms of classification performance on the widely used Indian Pines agricultural hyperspectral dataset. The experimental results show that, when fewer bands are selected, the proposed algorithm increases the overall average classification accuracy by 2.46 and 1.54 percentage points respectively.
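The abstract does not give the fitness formula, so the following is only a minimal sketch of how a band subset could be scored from information entropy and inter-band correlation. It rests on several assumptions: Pearson correlation stands in for the kernel fuzzy rough set correlation measure, the band dependence term is omitted, and the function names `band_entropy`, `pearson` and `fitness` and the "entropy minus redundancy" combination are hypothetical.

```python
import math
from collections import Counter


def band_entropy(band):
    """Shannon entropy (in bits) of the discrete pixel values of one band."""
    n = len(band)
    return -sum((c / n) * math.log2(c / n) for c in Counter(band).values())


def pearson(a, b):
    """Pearson correlation of two equally long bands (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def fitness(bands, subset):
    """Hypothetical score: mean entropy minus mean absolute pairwise correlation.

    This is an illustrative stand-in, not the fitness function of the paper.
    """
    mean_entropy = sum(band_entropy(bands[i]) for i in subset) / len(subset)
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    if not pairs:
        return mean_entropy
    redundancy = sum(abs(pearson(bands[i], bands[j])) for i, j in pairs) / len(pairs)
    return mean_entropy - redundancy


# Three toy "bands", each holding the values of the same four pixels.
bands = [
    [0, 1, 2, 3],
    [0, 2, 4, 6],  # perfectly correlated with band 0
    [3, 1, 0, 2],  # only weakly correlated with band 0
]
score_redundant = fitness(bands, [0, 1])
score_diverse = fitness(bands, [0, 2])
```

Under this toy scoring, the subset that pairs band 0 with the weakly correlated band 2 scores higher than the subset that pairs it with its scaled copy. A search procedure such as the grasshopper optimization algorithm would then look for the subset that maximizes a score of this kind.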
Minority class samples are frequently misclassified when massive amounts of real-world imbalanced data are classified with traditional algorithms, because most of these algorithms are suited only to balanced class distributions or to samples with equal misclassification costs. To overcome this problem, a classification algorithm for imbalanced datasets based on cost-sensitive ensemble learning and oversampling, named New Imbalanced Boost (NIBoost), was proposed. Firstly, in each iteration, an oversampling algorithm was used to add a certain number of minority samples to balance the dataset, and a classifier was trained on the new dataset. Secondly, the classifier was used to classify the dataset, yielding the predicted class label of each sample and the classification error rate of the classifier. Finally, the weight coefficient of the classifier and the new weight of each sample were calculated from the classification error rate and the predicted class labels. Experiments were conducted on UCI datasets with decision tree and Naive Bayes as the weak classifiers. The results show that, when decision tree is used as the base classifier of NIBoost, compared with the RareBoost algorithm, the F-value increases by up to 5.91 percentage points, the G-mean by up to 7.44 percentage points, and the AUC by up to 4.38 percentage points. These results show that the proposed algorithm has advantages in classifying imbalanced data.
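The three steps described above can be sketched as a boosting loop with per-round oversampling. This is a minimal illustration under stated assumptions, not the NIBoost algorithm itself: the abstract does not specify the oversampling method or the cost-sensitive weight formulas, so the sketch substitutes random duplication of minority samples and the standard AdaBoost update. Decision stumps serve as the weak classifier, the minority class is assumed to carry label `+1`, the error rate is assumed to be measured on the original dataset, and the function names are hypothetical.

```python
import math
import random


def train_stump(X, y, w):
    """Fit a weighted single-feature threshold classifier (labels are +1/-1)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] >= t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda x: sign if x[j] >= t else -sign


def oversample(X, y, w, rng):
    """Duplicate random minority samples (label +1) until the classes balance."""
    minority = [i for i, yi in enumerate(y) if yi == 1]
    n_extra = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(max(0, n_extra))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx], [w[i] for i in idx]


def boost(X, y, rounds=5, seed=0):
    """AdaBoost-style ensemble that rebalances the training set every round."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Step 1: balance the data, then train the weak classifier on it.
        Xb, yb, wb = oversample(X, y, w, rng)
        h = train_stump(Xb, yb, wb)
        # Step 2: classify the original data and measure the weighted error.
        pred = [h(x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        # Step 3: classifier coefficient and new sample weights (plain AdaBoost).
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1


# Toy imbalanced data: six majority samples (-1) and two minority samples (+1).
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.9], [1.0]]
y = [-1, -1, -1, -1, -1, -1, 1, 1]
clf = boost(X, y, rounds=3)
predictions = [clf(x) for x in X]
```

A cost-sensitive variant would replace the symmetric weight update in step 3 with one that penalizes minority-class errors more heavily. The exact form used by NIBoost is not stated in the abstract.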
A spatial co-location pattern represents a subset of spatial features whose instances are frequently located together in spatial neighborhoods. Existing interestingness metrics for spatial co-location pattern mining do not take into account the differences between features or the diversity among instances of the same feature. In addition, the results of traditional data-driven spatial co-location pattern mining often contain many useless or uninteresting patterns. To address these problems, firstly, a more general object of study, the spatial instance with a utility value, was proposed, and the Utility Participation Index (UPI) was defined as a new interestingness metric for spatial high utility co-location patterns. Secondly, domain knowledge was formalized into three kinds of semantic rules and applied to the mining process, and a new domain-driven iterative mining framework was put forward. Finally, extensive experiments compared the results mined with different interestingness metrics in terms of utility ratio and frequency, and examined how the mining results change after domain knowledge is taken into account. Experimental results show that the proposed UPI metric is a more reasonable measure that considers both frequency and utility, and that the domain-driven mining method can effectively find the co-location patterns that users are really interested in.
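To make the contrast between a frequency-only metric and a utility-aware one concrete, the sketch below computes the classic participation index (the minimum, over the features of a pattern, of the fraction of each feature's instances that take part in the pattern) alongside a utility-weighted counterpart. The weighted version is an assumption made for illustration: the abstract does not define UPI, so the share of a feature's total utility carried by its participating instances is used here as a hypothetical stand-in. The data layout and the function names are likewise hypothetical.

```python
def participation_ratios(row_instances, utilities, weighted=False):
    """Per-feature participation ratio of a co-location pattern.

    row_instances: list of dicts mapping feature -> participating instance id.
    utilities: dict mapping feature -> {instance id: utility value}.
    weighted=False counts instances; weighted=True sums their utilities
    (an illustrative assumption, not the UPI definition of the paper).
    """
    ratios = {}
    for feature, inst_utils in utilities.items():
        used = {row[feature] for row in row_instances}
        if weighted:
            ratios[feature] = sum(inst_utils[i] for i in used) / sum(inst_utils.values())
        else:
            ratios[feature] = len(used) / len(inst_utils)
    return ratios


def pattern_index(row_instances, utilities, weighted=False):
    """Minimum participation ratio over the features of the pattern."""
    return min(participation_ratios(row_instances, utilities, weighted).values())


# Pattern {A, B}: only a1 takes part among A's instances, but it holds most of A's utility.
utilities = {
    "A": {"a1": 10.0, "a2": 1.0, "a3": 1.0},
    "B": {"b1": 5.0, "b2": 5.0},
}
rows = [{"A": "a1", "B": "b1"}, {"A": "a1", "B": "b2"}]

classic = pattern_index(rows, utilities)                  # based on instance counts
utility_based = pattern_index(rows, utilities, weighted=True)  # based on utility share
```

In this toy case the count-based index is low because only one of three instances of A participates, while the utility-weighted value is high because that single instance accounts for most of A's utility. This is the kind of difference between instances of the same feature that a frequency-only metric cannot express.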